[Bugfix] Fix guided decoding with tokenizer mode mistral #11046
Conversation
Signed-off-by: Wallas Santos <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Thanks for working on this and trying to clean up along the way. My main concern is with using the tokenizer's vocab size - I think this may not be padded to the model's final logit size. Also the backend_str/vocab_type logic is a little confusing now.
- vocab_size=model_config.hf_config.vocab_size,
+ vocab_size=tokenizer.vocab_size,
@aarnphm there is a reason why we needed to reference the model's vocab size and not the tokenizer's, correct?
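For context, a small standalone snippet (not part of this PR) showing how the two values under discussion can differ; the model id is only an example, and the calls follow the Hugging Face transformers API:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example checkpoint only

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The model-side value (hf_config.vocab_size) sizes the logit/embedding
# matrices and may include padding; the tokenizer-side values count tokens.
print(config.vocab_size)     # model / logit-side vocab size
print(tokenizer.vocab_size)  # base tokenizer vocab, excludes added tokens
print(len(tokenizer))        # tokenizer vocab including added tokens
```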
Thanks for checking it out @mgoin! I really appreciate how fast you responded to this.
Also the backend_str/vocab_type logic is a little confusing now.
Yeah, but the problem with this part is that it was not correct. I reported a traceback in #11045.
from_huggingface(): incompatible function arguments. The following argument types are supported: (arg0: list[str], arg1: str, arg2: Optional[int], arg3: Optional[list[int]]) -> xgrammar.xgrammar_bindings.TokenizerInfo
First, backend_str is typed as str but was being assigned an enum value (VocabType). The from_huggingface function does not accept a VocabType, and that's why it crashes with the Mistral tokenizer and possibly other tokenizer types as well. So for these cases we have to call the constructor of TokenizerInfo and pass a VocabType (and no backend_str), which may be RAW for tokenizers like tiktoken and BYTE_FALLBACK for Mistral. I think maybe I can make it more explicit: the first time I wrote a solution for this I added a boolean field on TokenizerData, something like is_from_huggingface, but I decided to make it more concise and infer that from the values of these fields (which must be mutually exclusive). Of course I can add more comments to clarify and add asserts to prevent wrong behavior.
What do you think?
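For reference, a rough sketch of the two construction paths described above; this is not the PR's actual code, and the encoded_vocab field plus the exact xgrammar call signatures are assumptions inferred from the traceback quoted earlier:

```python
import xgrammar as xgr

from vllm.model_executor.guided_decoding.xgrammar_decoding import TokenizerData


def build_tokenizer_info(data: TokenizerData) -> xgr.TokenizerInfo:
    # Assumed field: data.encoded_vocab (list[str]); only backend_str and
    # vocab_type are shown in this PR's TokenizerData snippet.
    if data.backend_str is not None:
        # Hugging Face-style tokenizers: from_huggingface takes a backend
        # string, never a VocabType (see the traceback above).
        return xgr.TokenizerInfo.from_huggingface(
            data.encoded_vocab, data.backend_str)
    # Mistral / tiktoken-style tokenizers: use the constructor with a
    # VocabType instead (BYTE_FALLBACK for Mistral, RAW for tiktoken-like).
    return xgr.TokenizerInfo(data.encoded_vocab, vocab_type=data.vocab_type)
```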
We need to reference the tokenizer vocab size because of the additional padding tokens.
Here is the thread: https://vllm-dev.slack.com/archives/C07QQ8DAXMK/p1732673561777159
Thanks! I reverted the code.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Just a few final requests from my end, otherwise LGTM.
@@ -115,25 +116,9 @@ async def get_guided_decoding_logits_processor(
def get_local_guided_decoding_logits_processor(
        guided_params: GuidedDecodingParams, tokenizer: PreTrainedTokenizer,
        model_config: ModelConfig) -> LogitsProcessor | None:
    guided_params = maybe_backend_fallback(guided_params)
Can you revert this change?
This one is used for the offline use case, with LLM, whereas get_guided_decoding_logits_processor is used for the online use case.
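For context, a minimal sketch of that offline use case (model id and prompt are only examples; the API follows vLLM's LLM / SamplingParams interface around the time of this PR):

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Offline path: LLM drives get_local_guided_decoding_logits_processor,
# while the OpenAI-compatible server drives the async online variant.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3", tokenizer_mode="mistral")

params = SamplingParams(
    max_tokens=16,
    guided_decoding=GuidedDecodingParams(choice=["yes", "no"]),
)
outputs = llm.generate(["Is Paris the capital of France? Answer:"], params)
print(outputs[0].outputs[0].text)
```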
I reviewed what I did and realized it was not so good, given the difference in implementation between get_local_outlines_guided_decoding_logits_processor and get_outlines_guided_decoding_logits_processor. But I tried something a little bit different, rather than reverting everything, to avoid code duplication. See if you agree; if not, I won't insist and can revert with no problem. I also updated the tests to check both the offline and online versions so that they pass through all of these code paths, including the offline path.
I think this is even more confusing now that there are three functions. I would prefer a revert, since it seems you have no other changes to this file? We can consider a refactor in another PR.
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
# These fields are mutually exclusive: `backend_str` is used to create a
# TokenizerInfo with `TokenizerInfo.from_huggingface`, while `vocab_type` is
# used within the constructor of TokenizerInfo.
backend_str: str | None = None
vocab_type: xgr.VocabType | None = None

def __post_init__(self):
    # Check for mutual exclusivity
    assert not (self.backend_str and self.vocab_type), \
        "backend_str and vocab_type are mutually exclusive"
Hmm, is it possible to just use either backend_str or vocab_type here?
The cartesian product is probably harder to document.
Hmm, is it possible to just use either backend_str or vocab_type here?
The way get_tokenizer_data was implemented, I don't see another way. The from_huggingface method from xgrammar has several paths, and depending on the type of the tokenizer it needs either backend_str or vocab_type.
The cartesian product is probably harder to document.
I'm sorry, what do you mean by cartesian product? Can you elaborate?
There are two mutually exclusive options, which will have different combinations across different models.
Considering how many models vLLM currently supports, in this case we would have to draw a table of which combinations work and which don't, hence the cartesian product of combinations.
Ok got it. Do you think documentation is required for this PR? My primary goal was to address the issue with guided decoding when using tokenizer_mode mistral, as mentioned in the related PR. Without this fix, vLLM crashes for xgrammar due to an unhandled code path.
In my opinion, it might be beneficial to apply these fixes and then assess whether extensive documentation or code checks are required to help users understand unsupported features. What do you think?
Nice, I think this LGTM once you remove the xgrammar import in the test and revert vllm/model_executor/guided_decoding/__init__.py.
from vllm.model_executor.guided_decoding.outlines_logits_processors import (
    JSONLogitsProcessor, RegexLogitsProcessor)
from vllm.model_executor.guided_decoding.xgrammar_decoding import TokenizerData
Importing from xgrammar_decoding.py will also import and therefore require xgrammar, in addition to your import xgrammar as xgr line below. I would recommend making your test_pickle_xgrammar_tokenizer_data function skip if xgrammar can't be imported, and simply including these two imports within that function. In fact, it may be best to add a dedicated xgrammar testing file.
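A sketch of what that conditional skip could look like, assuming (as in the snippet discussed earlier) that TokenizerData can be constructed with only vocab_type:

```python
import pickle

import pytest


def test_pickle_xgrammar_tokenizer_data():
    # Skip cleanly when xgrammar is not installed instead of failing at import.
    xgr = pytest.importorskip("xgrammar")

    # Keep the xgrammar-dependent import inside the test so collecting this
    # file does not require xgrammar either.
    from vllm.model_executor.guided_decoding.xgrammar_decoding import (
        TokenizerData)

    data = TokenizerData(vocab_type=xgr.VocabType.RAW)
    assert pickle.loads(pickle.dumps(data)).vocab_type == xgr.VocabType.RAW
```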
Thanks for the change, but you need to move from vllm.model_executor.guided_decoding.xgrammar_decoding import TokenizerData into the try-except as well.
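That is, something along these lines at module level (a sketch of the suggested guard, not the exact diff):

```python
try:
    import xgrammar as xgr
    from vllm.model_executor.guided_decoding.xgrammar_decoding import (
        TokenizerData)
except ImportError:
    xgr = None
    TokenizerData = None
```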
Signed-off-by: Wallas Santos <[email protected]>
Thanks for the review @mgoin @aarnphm.
NP, it made more sense to me when I tried to remove the model_config.
Added the conditional skip.
Agree, I added a TODO; I guess it makes sense to do that in another PR.
Signed-off-by: Wallas Santos <[email protected]>
Nice LGTM!
It appears that AMD CI isn't using the correct version of
Hey @DarkLight1337, thanks for checking this. I did a quick search and I think there is a conflict with this
I checked the image of the CI for AMD and found this:
In my environment I have
Not sure of the best way to fix it besides setting it in the requirements... Any suggestions?
I guess you just have to update the requirements file.
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
@DarkLight1337 updated it. Let's see what CI says now.
This PR addresses the issues reported in #11045.

Changelog:

- Updated the tests in test_guided_processors and added them to CI (they were not being executed) to validate the changes.

FIX #11045 (link existing issues this PR will resolve)